We investigate whether three types of post hoc model explanations (feature attribution, concept activation, and training point ranking) are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test time, especially for non-visible artifacts such as a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
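To make the idea of a semi-synthetic dataset with a pre-specified spurious artifact concrete, here is a minimal sketch, assuming images stored as a NumPy array of shape (N, H, W, C) in [0, 1] and binary labels. The artifact (a white corner patch), the function names `add_spurious_patch` and `artifact_mass`, and the parameter `p_spurious` are all hypothetical illustrations, not the authors' pipeline or metrics. The second function shows one simple way to ask how much of a feature attribution map falls on the known artifact region.

```python
import numpy as np

def add_spurious_patch(images, labels, p_spurious=0.9, patch_size=4, seed=0):
    """Hypothetical artifact injection: stamp a white square in the top-left
    corner so that the patch is correlated with label 1.

    images: float array (N, H, W, C) with values in [0, 1]
    labels: int array (N,) with values in {0, 1}
    p_spurious: probability that a label-1 image gets the patch; label-0
                images get it with probability 1 - p_spurious.
    Returns a modified copy of the images and a boolean array (N,) marking
    which images were stamped.
    """
    rng = np.random.default_rng(seed)
    out = images.copy()  # leave the caller's data untouched
    p = np.where(labels == 1, p_spurious, 1.0 - p_spurious)
    stamped = rng.random(len(labels)) < p
    out[stamped, :patch_size, :patch_size, :] = 1.0
    return out, stamped

def artifact_mass(attribution, patch_size=4):
    """Fraction of absolute attribution (shape (H, W)) inside the patch region."""
    a = np.abs(attribution)
    total = a.sum()
    return float(a[:patch_size, :patch_size].sum() / total) if total > 0 else 0.0
```

A model trained on the stamped data with `p_spurious` close to 1 can be checked for reliance on the patch (e.g., by comparing accuracy on stamped vs. unstamped test images). Note that `artifact_mass` only works when the artifact location is known, which is exactly the assumption the abstract says practitioners usually cannot make.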
Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab initio potential energy surfaces that can reliably predict macroscopic properties with experimental accuracy. In generating accurate and transferable potentials, the most time-consuming and arguably most important task is generating the training set, which still requires significant expert user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework for formulating an accelerated sampling algorithm specifically for the task of training database generation. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics) and add a biasing term that drives the system towards high uncertainty and thus to unseen training configurations. Building on this framework, general protocols for building training databases for alloys and polymers are presented. For alloys, ACE potentials for AlSi10 are created by fitting to a minimal HAL-generated database containing 88 configurations (32 atoms each), with fast evaluation times of <100 μs/atom/CPU core. These potentials are shown to predict the melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE; the resulting potential determines the density of a long polyethylene glycol (PEG) polymer of 200 monomer units with experimental accuracy while being fitted only to small isolated PEG polymers with sizes ranging from 2 to 32 monomer units.
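The biasing idea can be illustrated on a toy one-dimensional system. This is a minimal sketch under stated assumptions, not the HAL implementation: uncertainty is taken to be the standard deviation σ of a hypothetical committee of energy models, the biased energy is assumed to have the form E − τσ (so the bias lowers the energy, and hence attracts the sampler, where the committee disagrees), and the sampler is overdamped Langevin dynamics. The committee, `tau`, `sigma_stop`, and all function names are illustrative.

```python
import numpy as np

# Hypothetical committee: 8 slightly different 1D double-well energies
# E_k(x) = a_k (x^2 - 1)^2 + b_k x. Their spread stands in for model uncertainty.
_rng = np.random.default_rng(0)
A = 1.0 + 0.1 * _rng.standard_normal(8)
B = 0.05 * _rng.standard_normal(8)

def committee_energies(x):
    return A * (x**2 - 1.0) ** 2 + B * x

def biased_energy(x, tau):
    e = committee_energies(x)
    return e.mean() - tau * e.std()  # assumed form: E_bias = E - tau * sigma

def biased_force(x, tau, h=1e-5):
    # Central finite difference of -dE_bias/dx (a real code would use autodiff).
    return -(biased_energy(x + h, tau) - biased_energy(x - h, tau)) / (2 * h)

def hal_langevin(x0, tau, n_steps=2000, dt=1e-3, kT=0.1, sigma_stop=None, seed=1):
    """Overdamped Langevin dynamics on the biased energy. If sigma_stop is given,
    stop as soon as the committee uncertainty exceeds it and return that
    configuration as a candidate for the training database."""
    rng = np.random.default_rng(seed)
    x = x0
    for step in range(n_steps):
        noise = np.sqrt(2.0 * kT * dt) * rng.standard_normal()
        x = x + dt * biased_force(x, tau) + noise
        if sigma_stop is not None and committee_energies(x).std() > sigma_stop:
            return x, step
    return x, n_steps
```

With `tau=0` this reduces to ordinary Langevin sampling of the committee-mean energy; increasing `tau` pushes trajectories towards regions where the models disagree, which is the qualitative behavior the abstract describes.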
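NLP models trained on text have been shown to reproduce human stereotypes, which can magnify harms to marginalized groups when systems are deployed at scale. We adapt the Agency-Belief-Communion (ABC) stereotype model of Koch et al. (2016) from social psychology as a framework for the systematic study and discovery of stereotypical group-trait associations in language models (LMs). We introduce the sensitivity test (SeT) for measuring stereotypical associations in language models. To evaluate SeT and other measures using the ABC model, we collect group-trait judgments from U.S.-based subjects to compare with English LM stereotypes. Finally, we extend this framework to measure LM stereotyping of intersectional identities.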
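Human-robot interaction influences humans, often changing their behavior. This paper explores one externality of this changed behavior: preference change. It extends prior work on preference change in AI systems. Specifically, the paper explores how a robot's adaptive behavior can exert influence through social interaction and thereby change a user's preferences. It argues that this risk is high given robots' unique ability, compared with other pervasive technologies, to influence behavior. Persuasive robots therefore run the risk of being manipulative.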
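When deployed, AI agents encounter problems that are beyond their autonomous problem-solving capabilities. Leveraging human assistance can help agents overcome their inherent limitations and robustly cope with unfamiliar situations. We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment. We demonstrate the practicality of the framework on a simulated human-assisted navigation problem. Aided by an assistance-requesting policy learned with our method, a navigation agent achieves a 7× improvement in success rate on tasks set in previously unseen environments, compared with fully autonomous behavior. We show that the agent can take advantage of different types of information depending on the context, and we analyze the benefits and challenges of learning the assistance-requesting policy when the assistant can recursively decompose tasks into subtasks.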
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing "bias" in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of "bias" (i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements) and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.
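The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied by a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.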
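As a rough illustration of a datasheet traveling alongside a dataset in code, here is a minimal sketch that treats it as a structured record with one free-text field per section named above (motivation, composition, collection process, recommended uses) plus a catch-all. The class `Datasheet`, its field names, and the helper methods are hypothetical; the actual proposal is a set of questions for dataset creators to answer, which this simplifies considerably.

```python
from dataclasses import dataclass, fields

@dataclass
class Datasheet:
    """Hypothetical minimal schema; sections follow those listed in the abstract."""
    dataset_name: str
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    recommended_uses: str = ""
    other: str = ""

    def _sections(self):
        # Every field except the dataset name is a documentation section.
        return [f.name for f in fields(self) if f.name != "dataset_name"]

    def missing_sections(self):
        """Names of sections that are still empty or whitespace-only."""
        return [name for name in self._sections() if not getattr(self, name).strip()]

    def render(self):
        """Plain-text rendering, one line per section."""
        lines = [f"Datasheet for {self.dataset_name}"]
        for name in self._sections():
            title = name.replace("_", " ").capitalize()
            body = getattr(self, name).strip() or "(not yet documented)"
            lines.append(f"{title}: {body}")
        return "\n".join(lines)
```

A check such as `missing_sections()` could, for example, be run before a dataset is published, so that undocumented sections are flagged rather than silently omitted.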